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We present a general probabilistic perspective on Gaussian filtering and smoothing. This allows 
us to show that common approaches to Gaussian filtering/smoothing can be distinguished solely 
by their methods of computing/approximating the means and covariances of joint probabilities. 
This implies that novel filters and smoothers can be derived straightforwardly by providing 
methods for computing these moments. Based on this insight, we derive the cubature Kalman 
smoother and propose a novel robust filtering and smoothing algorithm based on Gibbs sampling. 

Inference in latent variable models is about extracting information about a not directly observable quantity, 
the latent variable, from noisy observations. Both recursive and batch methods are of interest and referred 
to as filtering respective smoothing. Filtering and smoothing in latent variable time series models, including 
hidden Markov models and dynamic systems, have been playing an important role in signal processing, 
control, and machine learning for decades [12, 15, 3]. 

In the context of dynamic systems, filtering is widely used in control and robotics for online Bayesian 
state estimation [22], while smoothing is commonly used in machine learning algorithms for parameter learn- 
ing [3]. For computational efficiency reasons, many filters and smoothers approximate appearing probability 
distributions by Gaussians. This is why they are referred to as Gaussian filters/smoothers. 

In the following, we discuss Gaussian filtering and smoothing from a general probabilistic perspective 
without focusing on particular implementations. We identify the high-level concepts and the components 
required for filtering and smoothing, while avoiding getting lost in the implementation and computational 
details of particular algorithms (see e.g., standard derivations of the Kalman filter [1, 22]). 

We show that for Gaussian filters/smoothers for (non) linear systems (including common algorithms such 
as the extended Kalman filter (EKF) [15], the cubature Kalman filter (CKF) [2], or the unscented Kalman 
filter (UKF) [11]) can be distinguished by their means to computing Gaussian approximations to the joint 
probability distributions p(xt_i, Xt|zi:t_i) and p(xt, Zt|zi:t_i). Our results also imply that novel filtering 
and smoothing algorithms can be derived straightforwardly, given a method to determining the moments of 
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Figure 1: Graphical model of the dynamic system. The shaded nodes are the measured variables Zj, the un- 
shaded nodes are unobserved variables. The arrows represent probabilistic dependencies between 
the variables. 

these joint distributions. Using this insight, we present and analyze the cubature Kalman smoother (CKS) 
and a filter and an RTS smoother based on Gibbs sampling. 

We start this paper by setting up the problem and the notation, Sec. 1. We thereafter proceed by re- 
viewing Gaussian filtering and RTS smoothing from a high-level probabilistic perspective to derive sufficient 
conditions for Gaussian filtering and smoothing, respectively (Sees. 2 and 3). The implications of this result 
are discussed in Sec. 4, which lead to the derivation of a novel Gaussian filter and RTS smoother based on 
Gibbs sampling. Sec. 5 provides proof-of-concept numerical evaluations for the proposed method for both 
linear and nonlinear systems. Sees. 6-7 discuss related work and conclude the paper. 

1 Problem Setup and Notation 

We consider discrete-time stochastic dynamic systems of the form 

Xt = /(Xt_i) +Wt, (1) 

■Lt = gi^t) + vt , (2) 

where G is the state, zj G is the measurement at time step t = 1,...,T, wj ~ A^(0, Q) is 
i.i.d. Gaussian system noise, vj '-^ A/^(0, R) is i.i.d. Gaussian measurement noise, / is the transition/system 
function and g is the measurement function. The graphical model of the considered dynamic system is given 
in fig. 1. 

The noise covariance matrices Q, R, the system function /, and the measurement function g are assumed 
known. If not stated otherwise, we assume nonlinear functions / and g. The initial state xq of the time 
series is distributed according to a Gaussian prior distribution ;j(xo) = A/'(/iQ, I^g)- The purpose of filtering 
and smoothing is to find approximations to the posterior distributions p(xj|zi:t-), where a subscript 1 : r 
abbreviates 1, . . . , r, with r = t for filtering and t = T for smoothing. 

In this paper, wc consider Gaussian approximations AA(xt | /^^^, S^^) of the latent state posteriors j9(xf|zi;T-). 
We use the shorthand notation a^^ where a = ^ denotes the mean // and a = XI denotes the covariance, b 
denotes the time step under consideration, c denotes the time step up to which we consider measurements, 
and d G {x,z} denotes either the latent space (x) or the observed space (z). 

Let us assume a prior p(xo) = jo(xo|0) and a sequence zi,. . . ,zt of noisy measurements of the latent 
states Xq, . . . ,xr through the measurement function g. The objective of filtering is to compute a posterior 
distribution p(xt|zi:t) over the latent state as soon as a new measurement zj is available. Smoothing extends 
filtering and aims to compute the posterior state distribution of the hidden states Xj, f = 0, . . . , T, given all 
measurements zi, . . . , z^ (see e.g., [1, 22]). 
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2 Gaussian Filtering 

Given a prior p(xo) on the initial state and a dynamic system (e.g., Eqs. (l)-(2)), the objective of filtering 
is to infer a posterior distribution p(xf |zi:() of the hidden state x^, t = 1, . . . ,T, incorporating the evidence 
of the measurements zi-^t- Specific for Gaussian filtering is that posterior distributions are approximated 
by Gaussians [22]. Approximations are required since generally a Gaussian distribution mapped through a 
nonlinear function does not stay Gaussian. 

Assume a Gaussian filter distribution p{xt-i\zi-t-i) = ■^ifJ't_i\^_i,^t_i\t-i) is given (if not, we employ 
the prior p(xo) = p{xq\$) = J\f{yiQ^^, 5^o|0)) °^ initial state. Using Bayes' theorem, the filter distribution 
at time t is 

/ I X P(xt,Zt|zi:t_i) / I X / I ^ /0\ 

p(xt Zi:t) = — — OC p(Zt Xt)p(xt Zi:t_i) . (3) 

p(Zt|Zi:t_i) 

Proposition 1 (Filter Distribution). Gaussian filters approximate the filter distribution p{yit\^i:t) using a 
Gaussian distribution M{iJ,^^,'S^^^). The moments of this approximation are in general computed through 

t^t\t = i^t\t-i + ^ui%-lr\^t - K\t-i) , (4) 

^t\t — ^t\t-i ~ -^tit-iy^tit-i) ^t\t-i- 

Since the true moments of the joint distribution p{^f,'^t\'^i:t-i) can in general not be computed analytically, 
approximations/estimates are used (hence the ^-symbols). 

Proof. Generally, filtering proceeds by alternating between predicting (time update) and correcting [mea- 
surement update) [1, 22]: 

1. Time update (predictor) 

a) Compute the predictive distribution p(xt|zi:t_i). 

2. Measurement update (corrector) 

a) Compute the joint distribution p(xt, Zt|zi:t_i) of the next latent state and the next measurement. 

b) Measure zj. 

c) Compute the posterior p(xt|zi:t). 

In the following, we detail these steps to prove Prop. 1. 

2.1 Time Update (Predictor) 

(a) Compute the predictive distribution p{xt\zi-t-i). The predictive distribution of state x at time t given 
the evidence of measurements up to time i — 1 is 

p(xt|zi:t_i) = Xt_i)p(xt_i|zi:t_i) dxt_i , (6) 

where p(xt|xt_i) = A/'(xt | /(xt_i),Q) is the transition probability. In Gaussian filters, the predictive 
distribution p(xt|zi:(_i) in Eq. (6) is approximated by a Gaussian distribution, whose exact mean and 
covariance are given by 

/X^^_;^:=ExJXt|zi:t_i]=Ex,_i,wt[/(Xt-l) + Wt|zi:t_i] = J /(xt_i)p(xt_i|zi:t_i) dxt_i , (7) 
^t\t-l •= COVxJxt|zi:t_l] = COVxt_i[/(xt_l)|zi:t_l] + COVwt [wt] 

= y"/(xt_i)/(xt_i)^p(xt_i|zi:t_i)dXt_i-/i^j_l(iU^j_l)^+ ^ (8) 



=COVxj_i [/(Xt_l)|zi:t_l] 



=COVwt [Wt] 
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respectively. In Eq. (7), we exploited that the noise term wj in Eq. (1) has mean zero and is independent. 
A Gaussian approximation to the time update p(xj|zi:t_i) is then given by M{xt \ 

2.2 Measurement Update (Corrector) 

(a) Compute the joint distribution 

p(xt,Zt|zi:t_i) =p(zt|xt)p(xt|zi:t_i) . 



(9) 



time update 



In Gaussian filters, a Gaussian approximation to this joint is an intermediate step toward the desired 
Gaussian approximation of the posterior p(xt\zi:t)- If the mean and the covariance of the joint in Eq. (9) 
can be computed or estimated, the desired filter distribution corresponds to the conditional p(xt|zi;t) 
and is given in closed form [3]. 

Our objective is to compute a Gaussian approximation 



^t\t-l ^t\t-l 



(10) 



to the joint p(xi, zj|zi;j_i) in Eq. (9). Since a Gaussian approximation Af{iJ^^^_-^, 5]^^_-^) to the marginal 
p(xt|zi:t_i) is known from the time update, it remains to compute the marginal p(zt\zi;t-i) and the 
cross-covariance := coVxt,zt[^t,^t\^i:t-i]- 

• The marginal p{zt\zi:t-i) of the joint in Eq. (10) is 



p{zt\zi:t-l) = J p(zt|xt)p(xt|zi:t_i)dxt. 



where the state x^ is integrated out according to the time update p(xi|zi;t_i). The measurement 
Eq. (2), yields p(zt|xt) = A/'(g(xt), R). Hence, the exact mean of the marginal is 



A*t|t-1 •= Ezjzt|zi:t_l] = Exj5f(xt)|zi:t_l] = J g(xt) p(xt|zi:f-l) dXj 



(11) 



time update 



since the noise term vj in the measurement Eq. (2) is independent and has zero mean. Similarly, the 
exact covariance of the marginal p(zj|zi:j_i) is 



^t\t-l 



COVzJzt|zi:t_i = COVxj5(xt)|zi:t_i] + COVvJVt] 
J 5(Xt)5(xt)^p(Xt|zi:t_i)dXt-/i^|j_i(^f|j_i)^+ ^ 



(12) 



COVvt [vt] 



=COVxj5(xt)|zi:t_l] 

Hence, a Gaussian approximation to the marginal measurement distribution p{zt\zi:t-i) is given by 

^^{zt\^ltlt-l,^t-l), (13) 

with the mean and covariance given in Eqs. (11) and (12), respectively. 
• Due to the independence of Vf, the exact cross-covariance terms of the joint in Eq. (10) are 

^tjt-l = COVx,,zJxt,Zt|zi:t_i] = Ex(,zt[xtZ^|zi:t_i] - Exjx^ |zi:t_i]EzJzt |zi:t_i]'^ 
= J J Xiz7p(Xt,Zt|zi:t_i)dZtdXt - /i^j_i(/if|j_i)'^ . 
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Plugging in the measurement Eq. (2), we obtain 



-1 = J xt5(xt)^p(xt|zi:t_i) dxt - . (14) 



(b) Measure Zj. 

(c) Compute a Gaussian approximation of the posterior p{xt\zi-t)- This boils down to computing a con- 
ditional from the Gaussian approximation to the joint distribution p(xt, zt|zi:t_i) in Eq. (10). The 
expressions from Eqs. (7), (8), (11), (12), and (14), yield a Gaussian approximation A/'(xt | /i^^, E^^) of 
the filter distribution p(xt|zi:t), where 

K\t = f^t\t-i + m-imt-iVi^t - /.^i,_i) , (15) 

^t\t - ^t\t-l ~ ^t\t-l[^t\t-l) ^t\t-l ■ K^*^) 



Generally, the required integrals in Eqs. (7), (8), (11), (12), and (14) cannot be computed analytically. 
Hence, approximations of the moments are typically used in Eqs. (15) and (16). This concludes the proof 
of Prop. 1. □ 



2.3 Sufficient Conditions for Gaussian Filtering 

In any Bayes filter [22], the sufHcient components to computing the Gaussian filter distribution in Eqs. (15) 
and (16) are the mean and the covariance of the joint distribution p(xt, Zt|zi:t_i). Generally, the required 
integrals in Eqs. (7), (8), (11), (12), and (14) cannot be computed analytically. One exception are linear 
functions / and g, where the analytic solutions to the integrals are embodied in the Kalman filter [12]: 
Using the rules of predicting in linear Gaussian systems, the Kalman filter equations can be recovered when 
plugging in the respective means and covariances into Eq. (15) and (16) [20, 16, 1, 3, 22]. In many nonlinear 
dynamic systems, filtering algorithms approximate probability distributions (see e.g., the UKF [11] and the 
CKF [2]) or the functions / and g (see e.g., the EKF [15] or the GP-Bayes filters [6, 13]). Using the means 
and (cross-)covariances computed by these algorithms and plugging them into Eqs. (15)-(16), recovers the 
corresponding filter update equations for the EKF, the UKF, the CKF, and the GP-Bayes filters. 



3 Gaussian RTS Smoothing 

In this section, we present a general probabilistic perspective on Gaussian RTS smoothers and derive suffi- 
cient conditions for Gaussian smoothing. 

The smoothed state distribution is the posterior distribution of the hidden state given all measurements 

p(xt|zi:T), t = T,...,0. (17) 

Proposition 2 (Smoothing Distribution). For Gaussian smoothers, the mean and the covariance of a 
Gaussian approximation to the distribution p{xt\zi-T) are generally computed as 

f^t-i\T = K-iit-i + Jt-i(AfjT ~ Atjt_i) ) (18) 

^t-i|r = ^t-i|t-i + "^t-iC^tiT ~ '^t\t-i)'^J-i ' (19) 

3t-l = COv[xt_i,Xt|zi;t_i]cOv[xt|zi:t_i]~^ 
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Proof. The smoothed state distribution at the terminal time step T is equivalent to the filter distribution 
p{xt\zi:t) [1, 3]. The distributions p(xt_i|zi:x), t = T, . . . , 1, of the smoothed states can be computed 
recursively according to 

p(xt_i|zi:T) = J p(xt_i|xt,Zi:T)p(xt|zi:T)dxt = J p(xt_i |xt, Zi:t_i)p(xt |zi:t) dxj (21) 

by integrating out the smoothed hidden state at time step t. In Eq. (21), we exploited that xt_i is condi- 
tionally independent of the future measurements zt:T given xj. 

To compute the smoothed state distribution in Eq. (21), we need to multiply a distribution in x^ with a 
distribution in xt-i and integrate over x^. To do so, we follow the steps: 

(a) Compute the conditional p(xt_i|xt, zi:t_i). 

(b) Formulate p(xt_i|xt, zi:^) as an unnormalized distribution in xj. 

(c) Multiply the new distribution with p(xt|zi:T). 

(d) Solve the integral in Eq. (21). 

We now examine these steps in detail. Assume a known (Gaussian) smoothed state distribution p{xt\zi-T)- 

(a) Compute a Gaussian approximation to the conditional p(xt-i\xt,zi-t-i). We compute the conditional 
in two steps: First, we compute a Gaussian approximation to the joint distribution p(xt, xt_i|zi:t_i). 
Second, we apply the rules of computing conditionals to this joint Gaussian. Let us start with a Gaussian 
approximation 



AT 





l^t-llt-l 




( 







"^t-l,t\t-l 
y^t-l,t\t-l) ^t\t-l 



(22) 



to the joint p(x4_i, xt|zi:t_i) and have a closer look at its components: A Gaussian approximation of 
the filter distribution p(xt_i|zi:t_i) at time step t — 1 is known and is the first marginal distribution in 
Eq. (22). The second marginal A^(At^(_]^, S^^_^) is the time update and also known from filtering. To 
fully determine the joint in Eq. (22), we require the cross-covariance matrix 



-i,t\t-i = JJ ^t-ifi^t-iV p{^t-i\zi:t-i) dxt_i - /i^_i|t_i(At?jt_i)^ , (23) 



where we used the means /^f„x|t-i ^^'^ /^tjt-i measurement update and the time update, respec- 

tively. The zero- mean independent noise in the system Eq. (1) does not infiuence the cross-covariance 
matrix. The cross-covariance matrix in Eq. (23) can be pre-computed during filtering since it does not 
depend on future measurements. 

This concludes the first step (computation of the joint Gaussian) of the computation of the desired 
conditional. 

In the second step, we apply the rules of Gaussian conditioning to obtain the desired conditional distri- 
bution p(xt_i|xt,zi:i_i). For a shorthand notation, we define 

Jt-i •= ^t-i,t\t-ii^t\t-i)~^ ' (24) 

and obtain a Gaussian approximation A/"(xt_i | m, S) of the conditional distribution p{yit-i\'^t,zi:t-i) 
with 



m = Hti\t~i + Jt-i(xt - ^^'t\t-l) ' (25) 
i|t-i ~ '^t~i{'^t-i,t\t 



S = ^f-i|t-i - '^t~ii'^t-i,t\t-iV ■ (26) 
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(b) Formulate AA(x(_i | m, S) as an unnormalized distribution in x^. The square-root of the exponent of 
Af{xt-i I m, S) contains 

xt_i - m = r(xt_i) - Jt-ixj 

with r(xt_i) = Xt_i — + Jt_i/i^^_^, which is a Hnear function of both Xt_i and x^. We now 

reformulate the conditional Gaussian A^(xt_i | m, S) as a Gaussian in Jt-ix^ with mean r(xt_i) and the 
unchanged covariance matrix S. We obtain the conditional 

Ar(xt_i I m, S) = ciM{yit | a, A) , (27) 
with ci = -^|27r(J7_iS-iJt_i)-i|/|27rS| , 

and a = J^^^r(x(_i) , A = {jJ_iS^^Jt-i)~^ ■ Note that AA(x(_i | m, S) is an unnormalized Gaussian in 
x^, see Eq. (27). The matrix Jt-i defined in Eq. (24) is quadratic, but not necessarily invertible, in 
which case we take the pseudo-inverse. However, we will see that this inversion will be unnecessary to 
obtain the final result. 

(c) Multiply the new distribution with p(xt|zi:r). To determine p(xt_i|zi:r), we multiply the Gaussian in 
Eq. (27) with the smoothed Gaussian state distribution A/'(xt | /x^^, S^^), which yields the Gaussian 
approximation 

ciAr(xt I a, A)M{yit I iu^r> ^^^jr) = ciC2(a)Ar(xt | b, B) (28) 
of p(xi_i, xt|zi:T), for some b, B, where C2(a) is the inverse normalization constant of Af{xt \ b, B). 

(d) Solve the integral in Eq. (21). Since we integrate over xj in Eq. (21), we are solely interested in the parts 
that make Eq. (28) unnormalized, i.e., the constants ci and C2(a), which are independent of x^. The 
constant C2(a) in Eq. (28) can be rewritten as C2(xt_i) by reversing the step that inverted the matrix 
Jt_i, see Eq. (27). Then, C2(x(_i) is given by 

C2(xt_i) = c^W(xt_i I /x^ii^., ^t-l\T) > (29) 

^^t-l\T = ^^t-l\t-l + ^t-li^^'t\T ~ Mtjt-i) , (30) 

S^_l|T = '^t-l\t-l + J«-l(^tjT ~ '^t\t-l)'^J-l ■ (^1) 
Since cic^^ = 1 (plug Eq. (29) into Eq. (28)), the desired smoothed state distribution is 

p(xt_i |zi:t) = Ar(xt_i I Ht_i\T, S^_i|t) , (32) 

where the mean and the covariance are given in Eq. (30) and Eq. (31), respectively. 

This result concludes the proof of Prop. 2. □ 



3.1 Sufficient Conditions for Smoothing 

After filtering, to determine a Gaussian approximation to the distribution p(xt_i|zi:T) of the smoothed 
state at time t — 1, only a few additional ingredients arc required: the matrix Jt-i in Eq. (24) and Gaus- 
sian approximations to the smoothed state distribution p(x(|zi:x) at time t and the predictive distribution 
j3(xt|zi:t_i). Everything but the matrix 3t-i can be precomputed either during filtering or in a previous 
step of the smoothing recursion. Note that 3t-i can also be precomputed during filtering. 

Hence, for Gaussian RTS smoothing it is sufficient to determine Gaussian approximations to both the joint 
distribution p{xt,zt\zi:t-i) of the state and the measurement for the filter step and the joint distribution 
j3(xt_i, xt|zi:t_i) of two consecutive states. 



7 



Table 1: Computing the means and the covariances of p{yit,Zt\^i:t-i) and p{'x.t-i,^t\^i:t-i)- 





TCftlmaTi ■fnltPT /smnntlnPT' 

X VCA^XXXXCIiXX XXX U V-/X / oxxx\^ w uxxv^x 


EKF /EKS 


UKF/URTSS and CKF/CKS* 


At|t-i 




FAt-i|t-i 




^t\t-i 
^t\t-i 


GS^t-iG"^ + R. 


FS"_i|t-iF ' + Q 
GSt|j_iG + R 


E?fo-^^^(5(xg_,)-M,Vi)^ + R 


^t\t-l 
^t-l,t\t 


-^t-llt-l* 


^t-i|t-iF 


i:?fo-^^^(xg_, - /x^*_i)(5(xg_,) - /x^|,_,)T 



4 Implications and Theoretical Results 

Using the results from Sees. 2 and 3, we conclude that for filtering and RTS smoothing it is sufficient to 
compute or estimate the means and the covariances of the joint distribution p(x(_i, Xt|zi:t_i) between two 
consecutive states (smoothing) and the joint distribution p(xt, zt|zi:f_i) between a state and the subsequent 
measurement (filtering and smoothing). This result has two implications: 

1. Gaussian filters/smoothers can be distinguished by their approximations to these joint distributions. 

2. If there exists an algorithm to compute or to estimate the means and the covariances of the joint 
distributions p(x, /i(x)), where h G {/, 5}, the algorithm can be used for filtering and RTS smoothing. 

In the following, we first consider common filtering and smoothing algorithms and describe how they com- 
pute Gaussian approximations to the joint distributions p(xt_i, X(|zi:t_i) and j9(xj, zt|zi:i^i), respectively, 
which emphasizes the first implication (Sec. 4.1). After that, for the second implication of our results, we 
take an algorithm for estimating means and covariances of joint distributions and turn this algorithm into 
a filter/smoother (Sec. 4.2). 

4.1 Current Algorithms for Computing the Joint Distributions 

Tab. 1 gives an overview of how the Kalman filter, the EKF, the UKF, and the CKF represent the means 
and the (cross-) covariances of the joint distributions p(xt, zj|zi;(_i) and p(xf_i, xt|zi:t_i). In Tab. 1, we 
use the shorthand notation a? := aa^. For example, we defined (/(x|^-|^|^_-^) — At^^_^)^ := (/(xj^-|^|j_-^) — 

M^|,_i)(/(X«i|,_i)-M^|i_i)^. 

In the Kalman filter, the transition function / and the measurement function are linear and represented 

by the matrices F and G, respectively. The EKF linearizes / and g resulting in the matrices F and G, 
respectively. The UKF computes 2D + 1 sigma points X and uses their mappings through / and g to 
compute the desired moments, where Wm and Wc are the weights used for computing the mean and the 
covariance, respectively (see [22], pp. 65). The CKF computations are nearly equivalent to the UKF's 
computations with slight modifications: First, the CKF only requires 2D cubature points X. The cubature 
points are chosen as the intersection of a D-dimensional unit sphere with the coordinate system. Thus, the 
sums run from 1 to 2D. Second, the weights Wc = i/D = Wm arc all equal [2]. 

Although none of these algorithms computes the joint distributions p(xj, zj|zi:j_i) and p(xj_i, xt|zi;t_i) 
explicitly, they all do so implicitly. Using the means and covariances in Fig. 1 in the filtering and smoothing 
Eqs. (4), (5), (18), and (19), the results from the original papers [12, 19, 15, 11, 21, 2] are recovered. To the 
best of our knowledge, Tab. 1 is the first presentation of the CKS. 
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Algorithm 1 Gibbs-RTSS 



1: init: ^(xo), Q, R, /, 5 

2: for t = 1 to T do 

3: infer moments of p(xt_i, xt|zi:t_i) 
4: infer moments of p(xt, Zt|zi:t_i) 
5: measure 

6: compute /^^^, S^i^, Jt_i 

7: end for 

8: for t = T to 1 do 

9: compute At^_i|y, ^t-i\T 
10: end for 




Figure 2: Graphical model for the Gibbs-filter/RTSS. X are data from a joint distribution, ft and S are 
the mean and the covariance of X. The parameters of the conjugate priors on the mean and the 
covariance are denoted by m, S and v, respectively. 

4.2 Gibbs-Filter and Gibbs-RTS Smoother 

We now derive a Gaussian filter and RTS smoother based on Gibbs sampling [9]. Gibbs sampling is an 
example of a Markov Chain Monte Carlo (MCMC) algorithm and often used to infer the parameters of the 
distribution of a given data set. In the context of filtering and RTS smoothing, we use Gibbs sampling for 
inferring the mean and the covariance of the distributions p(xt_i, xt|zi;t_i) and p(xt, zt\zi:t~i), respectively, 
which is sufficient for Gaussian filtering and RTS smoothing, see Sec. 4. 
Alg. 1 details the high-level steps of the Gibbs-RTSS. 

At each time step, we use Gibbs sampling to infer the moments of the joint distributions ^»(xt_i, xt|zi;t_i) 
and p(xt, Zt|zi:t_i). Fig. 2 shows the graphical model for inferring the mean /i and the covariance S from 
the joint data set X using Gibbs sampling. The parameters of the conjugate priors on the mean /x and the 
covariance 5] are denoted by m, S and v, respectively. 

To infer the moments of the joint p(xt_i, xt|zi:t_i), we first generate i.i.d. samples from the filter distribu- 
tion p(xt_i|zi:t_i) and map them through the transition function /. The samples and their mappings serve 
as samples X from the joint distribution p{xt-i,xt\zi-t-i). With a conjugate Gaussian prior A^(/i | m, S) 
on the joint mean, and a conjugate inverse Wishart prior distribution XW(S|^',z/) on the joint covariance 
matrix, we infer the posterior distributions on fj, and By sampling from these posterior distributions, 
we obtain unbiased estimates of the desired mean and the covariance of the joint p{yit-i,'x.t\^i:t-i) as the 
sample average (after a burn in). 

To infer the mean and the covariance of the joint p{xt,'^t\'^i:t-i), we proceed similarly: We generate 
i.i.d. samples from the distribution p{xt\zi-t-i) , which are subsequently mapped through the measurement 
function. The combined data set of i.i.d. samples and their mappings define the joint data set X. Again, 
we choose a conjugate Gaussian prior on the mean vector and a conjugate inverse Wishart prior on the 
covariance matrix of the joint p{xt, zt\zi-t-i)- Using Gibbs sampling, we sample means and covariances from 
the posteriors and obtain unbiased estimates for the mean and the covariance of the joint p(xt, zt|zi:t_i). 

Alg. 2 outlines the steps for computing the joint distribution p(xt, zt|zi:t_i). Since the chosen priors for 



i> initializations 
t> forward sweep (Gibbs-filter) 
> alg. 2 
alg. 2 

> Eqs. (15), (16), (24) 

backward sweep 
> Eqs. (30), (31) 
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Algorithm 2 Inferring the mean fj.^ ^ and the covariance 'Sx,z of p(xt, zj|zi;j_i) using Gibbs sampHng 
1: pass in marginal distribution p(xj|zi:t_i), burn-in period B, number L of Gibbs iterations, size A'' of 
data set 

init. conjugate priors on joint 



X:=[x«,y(x«) 



sample fx^ ~ AA(m, S) 

sample Xli ~ IW{^, v) 

for j = 1 to L do 
update m|X, /i^-, 
update S|X, yitj, 5]^ 
sample Atj+i ~ A/'(m, S) 
update *|X, 5]j 
update i^|X, /ij+i, 
sample Sj+i ~ I>V(*, i^) 

end for 

return IJ,^^^,^x,z 



mean and covariance J\f{iJ,^^^ \ m, S) and IW{'Sx,z\^, ^) 

> generate joint data set 



> for L Gibbs iterations do 
> posterior parameter (mean) of p^J-j) 
> posterior parameter (covariance) of p{Hj) 
\> sample mean of the joint 
> posterior hyper-parameter (scale matrix) of pC^j) 
> posterior hyper-parameter (degrees of freedom) of p(Sj) 

> sample covariance of the joint 

o unbiased estimate of the mean of the joint distribution 
> unbiased estimate of the covariance of the joint distribution 
> return inferred mean and covariance of the joint 



the mean and the covariance are conjugate priors, all updates of the posterior hyper-parameters can be 
computed analytically [10]. 

The moments of p(xt_i, xt|zi:t_i), which are required for smoothing, are computed similarly by exchanging 
the pass-in distributions and the mapping function. 

5 Numerical Evaluation 

As a proof of concept, we show that the Gibbs-RTSS proposed in Sec. 4.2 performs well in linear and 
nonlinear systems. As performance measures, we consider the expected root mean square error (RMSE) 
and the expected negative log-likelihood (NLL) per data point in the trajectory. For a single trajectory, the 
NLL is given by 

= "tTT £ logAr(xr'lM^., i^lf) , (33) 

t=o 

where t = t for filtering and t = T for smoothing. While the RMSE solely penalizes the distance of the 
true state and the mean of the filtering/smoothing distribution, the NLL measures the coherence of the 
filtering/smoothing distributions, i.e., the NLL values are high if x\^^^^ is an unlikely observation under 
p{xt\lJ-^^, (cr^^)^), r G {t,T}. In our experiments, we chose a time horizon T = 50. 

5.1 Proof of Concept: Linear System 

First, we tested the performance of the Gibbs-filter/RTSS in the linear system 

xt = xt-i + wt (34) 
zt = -2xt + vt, (35) 

where wt ^ Af{0,l),vt ^ Af{0,lO),p{xo) = M{0,5). In a linear system, the (E)KF is optimal and unbi- 
ased [1]. The Gibbs-filter/RTSS perform as well as the EKF/EKS as shown in Fig. 3, which shows the 
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EKF 


Gibbs-filter* 


EKS 


Gibbs-RTSS* 


RMSE 
NLL 


1.11 ±0.014 
1.52 ±0.012 


1.12 ±0.014 
1.52 ±0.012 


0.88 ±0.011 
1.30 ±0.013 


0.89 ±0.011 
1.30 ±0.012 



Figure 3: Expected performances (linear system) with standard error of the mean. The results obtained 
from the optimal linear algorithm and the Gibbs-filter/RTSS are nearly identical. 



filters 


Gibbs-filter* 


EKF 


CKF 


UKF 


RMSE 
NLL 


5.04 ±0.088 
2.87±0.12 


11.1 ±0.29 
26.1 ± 1.18 


6.18 ±0.17 
9.96 ±0.75 


8.57 ±0.16 
13.6 ±0.68 


smoothers 


Gibbs-RTSS* 


EKS 


CKS'^ 


URTSS 


RMSE 
NLL 


4.01 ±0.085 
2.78 ±0.15 


10.6 ±0.28 
90.6 ± 10.3 


5.66 ±0.20 
28.9 ±3.31 


8.02 ±0.16 
16.3 ±0.16 



Figure 4: Expected performances (nonlinear system) with standard error of the mean. The Gibbs-RTSS is 
the only coherent smoother, i.e., it improves the filtering results in the NLL measure. 

expected performances (with the corresponding standard errors) of the filters/smoothers over 100 indepen- 
dent runs, where xq ~ p(xq). The Gibbs-sampler parameters were set to (N, L, B) = (1000, 200, 100), Alg. 2. 



5.2 Nonlinear System: Non-stationary Growth Model 

As a nonlinear example, we consider the dynamic system 

xt = ^ + ^^^+Scos(l.2(t-l)) + wt, (36) 

2 

^t = %+vt, (37) 

with exactly the same setup as in [8]: wt ~ A/'(0, 1), ~ A/'(0, 10), and p(xq) = A/'(xo|0, 5). This system is 
challenging for Gaussian filters due to its quadratic measurement equation and its highly nonlinear system 
equation. 

We run the Gibbs-RTSS, the EKS, the CKS, and the URTSS [21] for comparison. We chose the Gibbs 
parameters (N,L,B) = (1000,200,100). For 100 independent runs starting from xq ~p(xo), we report the 
expected RMSE and NLL performance measures in Fig. 4. 

Both high expected NLL-values and the fact that smoothing makes them even higher hint at the inco- 
herencies of the EKF/EKS, the CKF/CKS, and the UKF/URTSS. The Gibbs-RTSS was the only considered 
smoother that consistently improved the results of the filtering step. Therefore, we conclude that the Gibbs- 
filter/RTSS is coherent. 

Fig. 5 shows example realizations of filtering and smoothing using the Gibbs-filter/RTSS, the EKF/ 
EKS, the CKF/CKS, and the UKF/URTSS, respectively. The Gibbs-filter/RTSS appropriately inferred the 
variances of the latent state while the other filters /smoothers did not (neither of them is moment-preserving) , 
which can lead to incoherent filtering/smoothing distributions [5], see also Fig. 4. 

6 Discussion 

Our Gibbs-filter/RTSS differs from [4], where Gibbs sampling is used to infer the noise in a linear system. 
Instead, we infer the means and covariances of the full joint distributions p(xf_i, xt|zi:j_i) and p(xt, zt\zi-t-i) 
in nonlinear systems from data. Neither the Gibbs-filter nor the Gibbs-RTSS require to know the noise 
matrices R, Q, but they can be inferred as a part of the joint distributions if access to the dynamic system 
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(a) Gibbs-filter (Gibbs-RTSS). RMSE: 5.56 (4.18), NLL: 2.65 
(2.45). 
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(b) EKF (EKS). RMSE: 11.3 (14.7), NLL: 16.5 (20.8). 



CD 



30 
20 

10 


S -10 

i-20 
-30 
-40 











AA 


CKF 

— CKS 

— -ground truth 

























10 



20 30 
time steps 



40 



50 



(c) CKF (CKS). RMSE: 5.66 (5.96), NLL: 7.32 (20.7). 
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(d) UKF (URTSS). RMSE: 7.87 (7.18), NLL: 8.66 (9.93). 



Figure 5: Example trajectories of filtering/smoothing in the nonlinear growth model using (a) Gibbs-RTSS, 
(b) EKS, (c) CKS, (d) URTSS. The filter distributions are represented by the shaded areas (95% 
confidence area), the smoothing distributions are shown by solid green lines (95% confidence area). 
The actual realization of the latent state is the dashed red graph. 

is given. Unlike the Gaussian particle filter [14], the proposed Gibbs-filter is not a particle filter. Therefore, 
it does not suffer from degeneracy due to importance sampling. 

Although the Gibbs-filter is computationally more involved than the EKF/UKF/CKF, it can be used 
as a baseline method to evaluate the accuracy and coherence of more efficient algorithms: When using 
sufficiently many samples the Gibbs-filter can be considered a close approximation to a moment-preserving 
filter in nonlinear stochastic systems. 

The sampling approach to inferring the means and covariances of two joint distributions proposed in this 
paper can be extended to infer the means and covariances of a single joint, namely, p{xt-i,xt,zt\zi-t-i)- This 
would increase the dimensionality of the parameters to be inferred, but it would remove slight inconsistencies 
that appear in the present approach: Ideally, the marginals p(xt|zi:j_i), i.e., the time update, which can be 
obtained from both joints p(xt-i,xt\zi-t-i) and p(xt, zt|zi:j_i) are identical. Due to the finite number of 
samples, small errors are introduced. In our experiments, they were small, i.e., the relative difi^erence error 
was smaller than 10~^. Using the joint p(xi_i, xt, zt|zi:t_i) would avoid this kind of error. 

The Gibbs-filter/RTSS only need to be able to evaluation the system and measurement functions. No 
further requirements such as differentiability are needed. A similar procedure for MCMC-based smoothing 
is applicable when, instead of Gibbs sampling, slice sampling [18] or elliptical slice sampling [17] is used, 
potentially combined with GPs that model the functions / and g. 

The Gibbs-RTSS code is publicly available at mloss . org. 

In the context of Gaussian process dynamic systems, the GP-EKF, the GP-UKF [13], and the GP-ADF [6] 
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can directly be extended to smoothers using the results from this paper. The GP-URTSS (smoothing 
extension of the GP-UKF) and the GP-RTSS (smoothing extension of the GP-ADF) are presented in [5]. 

7 Conclusion 

Using a general probabilistic perspective on Gaussian filtering and smoothing, we first showed that it is 
sufficient to determine Gaussian approximations to two joint probability distributions to perform Gaus- 
sian filtering and smoothing. Computational approaches to Gaussian filtering and Ranch- Tung-Striebel 
smoothing can be distinguished by their respective methods used to determining two joint distributions. 

Second, our results allow for a straightforward derivation and implementation of novel Gaussian filtering 
and smoothing algorithms, e.g., the cubature Kalman smoother. Additionally, we presented a filtering 
smoothing algorithm based on Gibbs sampling as an example. Our experimental results show that the 
proposed Gibbs- filter /Gibbs- RTSS compares well with state-of-the-art Gaussian filters and RTS smoothers 
in terms of robustness and accuracy. 
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